Identifying Non-Referential It: A Machine Learning Approach Incorporating Linguistically Motivated Patterns

Authors

  • Adriane Boyd
  • Whitney Gegg-Harrison
  • Donna K. Byron
Abstract

In this paper, we present a machine learning system for identifying non-referential it. Types of non-referential it are examined to determine relevant linguistic patterns. The patterns are incorporated as features in a machine learning system which performs a binary classification of it as referential or non-referential in a POS-tagged corpus. The selection of relevant, generalized patterns leads to a significant improvement in performance.
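
The idea of turning linguistic patterns into classifier features can be illustrated with a minimal sketch. It assumes tokens arrive as (word, Penn Treebank-style tag) pairs; the function name `it_features`, the word lists, the window size, and the specific patterns are all hypothetical choices made for illustration and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, POS tag); Penn Treebank-style tags are an assumption

# Hypothetical word lists, chosen only for illustration.
WEATHER_WORDS = {"rain", "rains", "raining", "rained",
                 "snow", "snows", "snowing", "snowed"}
SEEM_VERBS = {"seem", "seems", "seemed", "appear", "appears", "appeared"}


def it_features(tokens: List[Token], i: int, window: int = 5) -> Dict[str, bool]:
    """Return binary pattern features for the pronoun 'it' at index i.

    Each feature fires when a surface pattern loosely associated with
    non-referential 'it' occurs within `window` tokens after the pronoun.
    """
    following = tokens[i + 1 : i + 1 + window]
    words = [w.lower() for w, _ in following]
    tags = [t for _, t in following]
    return {
        # Extraposition-like context: "it is important that ..."
        "followed_by_that": "that" in words,
        # Extraposition-like context: "it is easy to see ..."
        "followed_by_to_inf": any(
            a == "TO" and b == "VB" for a, b in zip(tags, tags[1:])
        ),
        # An adjective shortly after 'it'
        "adjective_in_window": any(t.startswith("JJ") for t in tags),
        # Weather context: "it is raining"
        "weather_word": any(w in WEATHER_WORDS for w in words),
        # Raising-verb context: "it seems that ..."
        "seem_verb": any(w in SEEM_VERBS for w in words),
        # 'it' as the object of a preposition is more likely referential
        "preceded_by_preposition": i > 0 and tokens[i - 1][1] == "IN",
    }


if __name__ == "__main__":
    sentence = [("It", "PRP"), ("is", "VBZ"), ("important", "JJ"),
                ("that", "IN"), ("you", "PRP"), ("come", "VBP")]
    print(it_features(sentence, 0))
```

Feature dictionaries of this kind could then be passed to any binary classifier; the choice of learner is left open here.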

Similar resources

Elliphant: A Machine Learning Method for Identifying Subject Ellipsis and Impersonal Constructions in Spanish

This thesis presents Elliphant, a machine learning system for classifying Spanish subject ellipsis as either referential or non-referential. Linguistically motivated features are incorporated in a system which performs a ternary classification: verbs with explicit subjects, verbs with omitted but referential subjects (zero pronouns), and verbs with no subject (impersonal constructions). To the ...

Correcting Grammatical Verb Errors

Verb errors are some of the most common mistakes made by non-native writers of English but some of the least studied. The reason is that dealing with verb errors requires a new paradigm; essentially all research done on correcting grammatical errors assumes a closed set of triggers – e.g., correcting the use of prepositions or articles – but identifying mistakes in verbs necessitates identifyin...

Methods and Tools of Computational Linguistics for the Classification of Natural Non-referential Ellipsis in Spanish (review)

Vera Danilova. Abstract: This article is a brief survey of the few works devoted to modern natural language processing (NLP) approaches to the analysis of impersonal sentences in Spanish. Such analysis consists of classifying non-referential ellipsis, which can be used in machine translation systems. The NLP approaches related to Spanish are mainly based on the work of...

A machine learning method for identifying impersonal constructions and zero pronouns in Spanish / Un método de aprendizaje automático para la identificación de construcciones impersonales y pronombres cero en español

In this paper, we present a machine learning system for classifying subject ellipsis in Spanish as either referential or non-referential. To the best of our knowledge, this is the first attempt to automatically identify non-referential ellipsis in Spanish. An evaluation of our system against 6,827 finite verbs shows an accuracy of 87%.

Learning Verb Inference Rules from Linguistically-Motivated Evidence

Learning inference relations between verbs is at the heart of many semantic applications. However, most prior work on learning such rules focused on a rather narrow set of information sources: mainly distributional similarity, and to a lesser extent manually constructed verb co-occurrence patterns. In this paper, we claim that it is imperative to utilize information from various textual scopes:...

Publication date: 2005